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Abstract 

Many insect species have established long-term synnbiotic relationships with intracellular bacteria. Symbiosis with bacteria has pro- 
vided insects with novel ecological capabilities, which have allowed them colonize previously unexplored niches. Despite its impor- 
tance to the understanding of the emergence of biological complexity, the evolution of symbiotic relationships remains hitherto a 
mystery in evolutionary biology. In this study, we contribute to the investigation of the evolutionary leaps enabled by mutualistic 
symbioses by sequencing the genome of Blattabacterium cuenoti, primary endosymbiont of the omnivorous cockroach Blatta 
orientalis, and one of the most ancient symbiotic associations. We perform comparative analyses between the Blattabacterium 
cuenoti genome and that of previously sequenced endosymbionts, namely those from the omnivorous hosts the Blattella germanica 
(Blattelidae) and Periplaneta americana (Blattidae), and the endosymbionts harbored by two wood-feeding hosts, the subsocial 
cockroach Cryptocercus punctulatus (Cryptocercidae) and the termite Mastotermes darwiniensis (Termitidae). Our study shows a 
remarkable evolutionary stasis of this symbiotic system throughout the evolutionary history of cockroaches and the deepest branching 
termite M. darwiniensis, in terms of not only chromosome architecture but also gene content, as revealed by the striking conservation 
of the Blattabacterium core genome. Importantly, the architecture of central metabolic network inferred from the endosymbiont 
genomes was established very early in Blattabacterium evolutionary history and could be an outcome of the essential role played by 
this endosymbiont in the host's nitrogen economy. 
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Introduction 

Symbiotic associations between eukaryotes and prokaryotes 
are common in nature and have been described in all branches 
of the eukaryotic tree of life (Moya et al. 2008). Insects char- 
acterized by feeding upon unbalanced diets have established 
symbiotic associations with bacteria that provide nutrients 
lacking in the diet. This is the case of essential amino acid 
provision by Buchnera aphldicola, the primary endosymbiont 
of pea aphids feeding on phloem sap (Baumann 2005; 
Shigenobu and Wilson 2011), whereas the same metabolic 
function is performed by a consortium comprising the coprim- 
aries Buchnera aphidicola and Serratia symblotica in the case 
of the cedar aphid (Perez-Brocal et al. 2006; Lamelas et al. 
201 1). However, in insects such as cockroaches, which feed 



on complex diets, the role of the obligatory endosymbiont, 
Blattabacterium cuenoti, was unclear before the genomes of 
Blattella germanica (BBge) (Lopez-Sanchez et al. 2009) and 
Periplaneta americana (BPam) (Sabree et al. 2009) strains 
were sequenced. 

The symbiosis between Blattabacterium and cockroaches 
may have become established between 300 Ma, the age of 
the first fossils of roaches in the Carboniferous, and 140 Ma, 
that is, before the diversification of extant cockroach families 
in the Cretaceous (Clark et al. 2001; Vrsansky et al. 2002; Lo 
et al. 2003). Phylogenetic analyses based on 16S ribosomal 
DNA have located Blattabacterium as members of the class 
Flavobacteria in the phylum Bacteroidetes (Lopez-Sanchez 
et al. 2008). The presence of these endosymbiotic bacteria 
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has been observed in all studied species, with the only known 
exception being that of cockroaches from the genus Nocticola 
(Lo et al. 2007). Moreover, in termites, a sister clade of 
cockroaches, only Mastotermes darwiniensis has retained a 
symbiont (Bandi et al. 1 995). Recently, another two genomes 
of Blattabacterium strains have been sequenced from wood- 
feeding hosts, the subsocial roach Cryptocercus punctulatus 
(BCpu) (Neef et al. 2011), and the termite M. darwiniensis 
(BMda) (Sabree et al. 2012). All these four Blattabacteriunn 
genomes show typical features of other insect primary endo- 
symbionts, such as low GC content, total absence of repeated 
elements, and a reduced genome size compared to their clos- 
est free-living relatives (McCutcheon and Moran 2012). 

Flux balance analyses (FBA) of the reconstructed metabolic 
networks from BBge and BPam strains have highlighted the 
functional equivalence of both networks, despite a prolonged 
divergence time and a few remarkable enzymatic differences, 
such as the absence of the first three steps of the tricarboxylic 
acid cycle (TCA) cycle in BPam (Gonzalez-Domenech et al. 
2012). Analyses would also suggest the endosymbionts play 
a role in the host's nitrogen metabolism (Feldhaar et al. 2007) 
leading researchers to postulate a mechanism for the intrigu- 
ing ammonotelism described in cockroaches (Mullins and 
Cochran 1972, 1976; Cochran 1985). In addition, metabolic 
inferences of Blattabacterium strains from B. germanica 
(BBge) and P. americana (BPam) suggest they are involved in 
supplying essential amino acids and cofactors to the host 
(Lopez-Sanchez et al. 2009; Sabree et al. 2009). Moreover, 
Blattabacterium strains from all four studied species en- 
code urease genes, and endosymbiont-enriched extracts 
from P. americana and B. germanica show urease activity 
(Lopez-Sanchez et al. 2009), thus this activity may be 
essential for nitrogen recycling from urate deposits 
(Gonzalez-Domenech et al. 2012). On the other hand, BCpu 
and BMda strains retain the enzymes required for nitrogen 
metabolism, even though they have lost pathways for the 
synthesis of several amino acids (Neef et al. 2011; Sabree 
etal. 2012). 

In this work, we report on the genome of the 
Blattabacterium from Blatta orientalis (BBor), a sister clade of 
P. americana from the family Blattidae (Kambhampati 1995; 
Inward et al. 2007). Comparative genome analysis of all five 
genomes has shed light on the evolution of the primary en- 
dosymbiont because it became associated with a common 
ancestor of cockroaches and termites. Of particular interest 
is the extraordinary conservation of both genome architecture 
and gene content. 

Materials and Methods 

Cockroach Rearing, Dissections, and DNA Extraction 

Blatta orientalis (Blattaria: Blattidae) were reared at the Cava- 
nilles Institute for Biodiversity and Evolutionary Biology 



(University of Valencia, Spain) in compartments at 25 °C 
and fed with sucrose-enriched dog food. Blatta orientalis 
adults were killed by a 1 5-20 min treatment with ethyl ace- 
tate. The fat body was then separated, and the bacteriocytes 
were extracted as described previously (Gil et al. 2003; 
Lopez-Sanchez et al. 2008). Genomic DNA from the bacter- 
iocytes was obtained following the CTAB method (Murray and 
Thompson 1980). 

Genome Sequencing and Assembly 

DNA was sequenced using the Roche GS-FLX 100 pyrose- 
quencing technology. For half plate, 133,562 reads were ob- 
tained with an average length of 235 bp. Those reads were 
assembled using the Roche Newbler software, obtaining 
1,201 contigs with an average length of 846 bp; 39 of 
those contigs were greater than 500 bp in length and con- 
tained 745,669 nucleotides in 129,647 reads, which implied 
that 97% of reads were included in the 39 biggest contigs. 
Additionally, sequences obtained by the Sanger method were 
used to fill gaps and confirm some uncertainties. The data 
obtained from both methods were combined in a database 
as described by Lopez-Sanchez et al. (2009). The GAP4 soft- 
ware included in the Staden Package (Staden et al. 2000) was 
used for the final assembly. Comparison with other previously 
sequenced strains using the Artemis Comparative tool (Carver 
et al. 2005) was useful for correct orientation of different 
contigs, due to the genomic stability demonstrated by these 
genomes. 

Annotation 

Protein-coding genes (CDS) were predicted by GLIMMER3 
software (Delcher et al. 2007), using Blattabacterium symbi- 
otic strains from 6. germanica and P. americana as training set 
sequences. Then, they were manually curated with the 
genome viewer Artemis (Rutherford et al. 2000). CDS were 
annotated through basic local alignment search tool searches 
(Altschul et al. 1997) against the gene nonredundant Kyoto 
Encyclopedia of Genes and Genomes (KEGG) database 
(vvww.genome.jp, last accessed February 4, 2013) and the 
genomes of the previously annotated Blattabacterium strains. 
Genes from the five Blattabacterium strains were classified 
into Cluster of Orthologous Genes (COG) categories searching 
with BLASTP (e value: 0.001) against the cluster of ortholo- 
gous groups database (Vasudevan et al. 2003), and a heat 
map was generated with the gplots library (Warnes 2011). 
RNA genes were annotated by comparing the genome se- 
quence with RNA databases using the INFERNAL software 
(Nawrocki et al. 2009). 

Similar to previously sequenced Blattabacterium strains, the 
strain from Blatta orientalis does not possess any features de- 
termining replication origin, with the exception of the 
GC-skew, which was determined with the OriginX software 
(Worning et al. 2006). 
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Graphic representation of genome-compared graphs be- 
tween the five strains of Blattabacterium was obtained with 
the genoPlotR paclcage (Guy et al. 2010). 

Identification of Orthologous Genes 

Orthologous genes present in all five Blattabacterium strains 
and the free-living Bacteriodete Flavobacterium psychropfiilum 
were identified using the OrthoMCL algorithm (Chen et al. 
2006). The bacterial genomes used were the five strains of 
Blattabacterium and F. psychrophilum JIP02/86 (accession 
numbers in supplementary table SI, Supplementary Material 
online). The protein-coding genes of all bacteria were com- 
pared, all-against-all using BLASTP, the minimum f value was 
established at 1e - 05, a 70% cutoff and 1 .5 inflation value. 

Construction of Blattabacterium Pan-Genome 

Blattabacterium genomes were retrieved from their respective 
databases (see supplementary table SI, Supplementary Mate- 
rial online). Gene counts are based on orthology (supplemen- 
tary table S2, Supplementary Material online). Pseudogenes 
are considered absent. Visual display of the pan-genome sub- 
spaces was done using the R custom modified 
drawVennDiagram function of package gplots (Warnes 
2011). 

Phylogenetic Analysis 

Amino acid sequences from the orthologous genes present in 
all six genomes, mentioned earlier, were aligned with MAFFT 
(Katoh et al. 2005) and manually concatenated. ProtTest 2.4 
(Abascal et al. 2005) was used to select the best evolutionary 
model for our data. Finally, the maximum likelihood best tree 
was obtained with RAxML (Stamatakis et al. 2005) with 100 
bootstrap replicate and using the PROTGAMMA algorithm. 

Testing the Molecular Clock Hypothesis 

Nucleotide alignments from the protein-coding genes in the 
core from all five Blattabacterium strains were obtained from 
the MAFFT (Katoh et al. 2005) aligned protein sequences with 
the program Tranalign from the EMBOSS package (Rice et al. 
2000). Maximum likelihood models are known to be robust to 
violation of the model, including divergence times and satu- 
ration. However, because of the large distances in the phylog- 
eny and nucleotide biases, we minimized the underestimate 
of divergence levels owing to the problem of saturation of 
nucleotide sites by removing all third codon positions from 
downstream evolutionary sequence analyses. Moreover, first 
and second codon positions account for most of the nonsyn- 
onymous sites in the protein-coding genes, which are often 
subject to selection and are less prone to saturation. The 
best-fit evolutionary model for each alignment was selected 
with thejModeltest2 (Darnba etal. 2012). Finally, a likelihood 
ratio test (LRT) was performed as implemented in the program 



basemi from the PAML package version 4.4 (Yang 2007) using 
for each gene the evolutionary model previously selected. 
Each gene in the core was tested under two models, one 
assuming homogenous rates (rooted model, n-1 branch 
length are estimated) and the other assuming different rates 
for each branch (unrooted model. In -3 branch length 
should be estimated). The LRT compares the differences in 
the log likelihood (/), values for each model (2A/) with a 
distribution with n - 2 degrees of freedom. 

Core genes that do not reject the molecular clock hypoth- 
esis, possess an orthologous gene in the F. psychrophilum 
genome, and do not accumulate more than 2.5 nucleotide 
substitutions per site (supplementary table S3, Supplementary 
Material online) were used as data set to determine the date 
of the split between the strains BBor and BPam, as well as the 
split between the strains BMda and BCpu. Nucleotide align- 
ments were obtained following the aforementioned proce- 
dure, also removing third codon positions. Then best-fit 
model and the molecular clock were tested. The divergence 
time between BBge and the rest of the strains estimated ac- 
cording the fossil record (1 40 Ma, Vrsansky et al. 2002; Sabree 
et al. 2010) was used as a calibration point. 

Results 

Genome Characteristics 

The main features of the five Blattabacterium strains sequen- 
ced are summarized in table 1 . The BBor genome has a total 
size of 638,1 84 bp (with a coverage of 41 X) and is composed 
of a 634,449-bp chromosome and a 3,735-bp plasmid, with a 
GC content of 28.2% and 30.6%, respectively. Gene order is 
the same as in BPam and is highly conserved among all five 
Blattabacterium strains (fig. 1). In total, 620 putative genes 
were identified on the chromosome and seven on the plas- 
mid, distributed as follows: 579 protein-coding genes, one 
operon coding for the three rRNAs, 33 tRNAs, one tmRNA, 
the RNA component for the signal recognition particle, and 
the RNase P. Nine genes were finally annotated as pseudo- 
genes: cysH, cysl, cysG, hemC, IpxP, hemD, dut, as well as the 
genes coding for an hypothetical protein present also in BPam, 
BCpu, and BBge, which is included in the orthology group 
blb_0578 (see supplementary table S2, Supplementary Mate- 
rial online), and for the ABC transporter clustered in the ortho- 
logous group blb_0575. The genes cysH and cysl are both 
involved in sulfate reduction. The genes hemC, hemD, and 
cysG participate in biosynthesis of heme groups, the latter in 
siroheme biosynthesis. The genes IpxP and dut code for a 
thermonuclease family protein and a deoxyuridine triphos- 
phatase, respectively. Finally, seven genes are duplicated in 
this genome: rodA, uvrD, IpdA miaB argD, ppiQ and 
sertl, the first five in all five genomes, whereas ppiC and 
serC have only one copy in BMda and BCpu, respectively. In 
addition, dut has a pseudogenized copy in BBor, whereas 
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Table 1 



General Genomic Features of the Five Sequenced Blattabacterium Strains 



Strain'' 


BBor 


BPam 


BCpu 


BMda 


BBge 


GenBank accession 


CP003605, 


NC_013418.2, 


CP003015.1, 


NC_016146.1, 


NC_01 3454.1, 


number'' 


CP003606 


NC_0.13419 


CP003016.1 


NC_016150.1 


NC_01 5679.1 


Genome size (bp) 


638,184 


640,442 


609,561 


590,554 


640,335 


Plasmids 


1 


1 


1 


1 


1 


Plasmid size (bp) 


3,735 


3,448 


3,816 


3,306 


3,485 


Clnromosome size (bp) 


634,449 


636,994 


605,745 


587,248 


636,850 


GC content (%) 


28.1 


28.2 


23.8 


27.5 


27.1 


Total number of genes'^ 


628 (7) 


621 (4) 


589 (4) 


593 (4) 


631 (4) 


CDSs 


579 


582 


548 


544 


590 


rRNAs 


3 


3 


3 


3 


3 


tRNAs 


33 


33 


32 


34 


34 


Other ncRNAs 


3 


3 


3 


3 


3 


Duplicated genes 


7 


8 


7 


7 


9 


Pseudogenes 


9 


6 


3 


9 


1 



^BBor, Blattabacterium from Blatta orientalis; BPam, Blattabacterium from Periplaneta americana; BBge, Blattabacterium from Blattella germanica; BCpu, 
Blattabacterium from Cryptocercus punctulatus, and BMda for Blattabacterium from Mastotermes darwiniensis. 
''Accession number for chromosome above, accession number for plasmid below. 
'^Number in parenthesis reflects the number of genes coded in the plasmid. 




Fig. 1. — Gene order comparison between all Blattabacterium strains. Red shows CDS located in the leading strand, and orange indicates CDS coded on 
lagging strand. Lines between genomes connect orthologous genes in green if genes are in the same orientation, in blue if they are inverted. In this case, the 
first gene in all strains is pdxJ. 



hemD maintains only one intact copy (but different paralogs) 
in all five genomes (supplementary table S4, Supplementary 
Material online). 

Pan-Genome Reconstruction and Functional Profiles 

The pan-genome for all five B/attabacfer/tvm-sequenced ge- 
nomes possess 612 CDS, one ribosomal operon (the three 



rRNA genes, namely 16S, 23S, and 5S rRNA), 34 tRNA 
genes, and three ncRNA genes (fig. 2). The core shared by 
all strains accounts for 504 CDS, three rRNAs, 31 tRNA genes, 
and three ncRNAs. The accessory genome comprises 108 
protein-coding genes and three tRNA genes, distributed as 
follows: 16 genes are strain specific, 13 are shared by two 
strains, 39 by three strains, and finally 43 by four strains 
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Fig. 2. — Venn diagram showing the genes encoded by each Blattabacterium strain. Tlie core genes are tliose located at the intersection of five circles. 



(fig. 2). Of the tRNA genes, one codes for proline (lost in BPam 
and BCpu; anticodon GGG), one for arginine (lost in BCpu; 
anticodon CCG), and one for valine (lost in BBor; anticodon 
TAG). It is worth mentioning that two annotated CDS embed- 
ded in the 23S rRNA gene in BMda (Sabree et al. 2012), 
namely MADAR_308 (cell wall-associated hydrolase) and 
MADAR_309 (hypothetical protein) (blb_589 and blb_611, 
respectively, in supplementary table S2, Supplementary Mate- 
rial online), have been removed from the accessory genome 
because they are false positive as the result of genome mis- 
annotations as shown by Tripp et al. (201 1). 

Protein-coding genes have been classified in COG catego- 
ries for all five Blattabacterium strains (supplementary table S2, 
Supplementary Material online). According to the Kruskal- 
Wallis nonparametric test (x^ = 0.2244, df = 4, P value = 
0.9942), there were not any statistical differences in COG 
distribution among the five strains. Notwithstanding, a clus- 
tering diagram, with the functional profile of F. psychrophilum 
as outgroup, shows the functional profile of BBor is closer to 
that observed in the pan-genome and the other two omniv- 
orous strains, BBge and BPam, whereas the strains from the 
two wood-feeding species, Cpu and Mda, cluster with the 
core genome (fig. 3). The most represented functional cate- 
gory in the endosymbiont genomes are genes involved in 
translation (J), accounting for 20% of all genes. The second 
most represented functional category, especially in the omniv- 
orous species, contains genes involved in amino acid transport 



and metabolism (E) (13%) and shows the main gap between 
the wood-feeding species and the rest, because it harbors 
most of the gene losses described in these two strains (Neef 
etal. 2011; Sabree et al. 2012). 

Differential Gene Losses and Evolutionary History of 
Blattabacterium Genomes 

To obtain a reliable topology of the different Blattabacterium 
lineages and reconstruct the chronology of gene losses, a 
phylogenetic reconstruction was performed with the five 
Blattabacterium strains and the free-living bacterium F. psy- 
chrophilum as outgroup. The 454 protein-coding genes found 
to be orthologous among Blattabacterium strains and possess- 
ing orthologs in the genome of F psychrophilum were used 
for the analysis, giving rise to a concatenated sequence of 
aligned proteins with 173,523 sites. The best evolutionary 
model for our data set, estimated with ProtTest (Abascal 
et al. 2005), was CpREV-n G + E. The resulting phylogeny sit- 
uates the two Blattidae endosymbionts as a sister clade to the 
one formed by BCpu and BMda, placing the BBge strain as the 
most basal lineage, thus corroborating previous analyses of 
host genes (Inward et al. 2007) (supplementary fig. SI, 
Supplementary Material online). 

The phylogenetic reconstruction of the five Blattabacterium 
lineages, the corresponding pan-genome, and the set of re- 
tained genes in each genome were used to establish the evo- 
lutionary history of gene losses, basically following the same 
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Fig. 3. — Heat map comparison of COG frequency profiles among 
different Blattabacterium strains witli tlieir pan-genome and core and 
tlie free-living Bacteroidete F. psychrophilum. J, translation; K, transcrip- 
tion; L, replication, recombination, and repair; D, cell cycle control; M, cell/ 
wall membrane biogenesis; N, cell motility; 0, posttranslational modifica- 
tion, protein turnover, chaperones; P, inorganic ion transport and metab- 
olism; T, signal transduction mechanism; U, intracellular trafficking and 
secretion; V, defense mechanism; V, defense mechanism; C, energy pro- 
duction and conversion; E, amino acid transport and metabolism; F, nu- 
cleotide transport and metabolism; G, carbohydrate transport and 
metabolism; H, coenzyme transport and metabolism; I, lipid transport 
and metabolism; Q, secondary metabolites biosynthesis, transport, and 
catabolism; R, general function prediction only; and S, function unknown. 



Strategy as previously reported (Lamelas, Gosalbes, et al. 
201 1). The underlying assumption to our analyses is that the 
endosynnbiotic genomes have experienced no gene gain, thus 
the retained genes are only vertically inherited. Although this 
assumption is legitimate (e.g., endosymbiotic bacteria are 
housed in bacteriocytes and lack complete set of recombina- 
tion genes), we nevertheless tested for possible events of 
horizontal transfer of genes from other bacteria to the endo- 
symbiont and we found no evidence supporting such events 
(supplementary results, Supplementary Matenal online). In 
consequence, the status of each ancestral gene in each 



extant Blattabacterium genome was evaluated: unique or con- 
vergent losses and active state or pseudogenized sequence 
(table 2). Unique losses correspond to those affecting one 
specific strain or occurring before divergence of related strains 
(i.e., the loss took place in their most recent common ances- 
tor). When unrelated lineages show specific pseudogenized 
sequences or the absence of a gene, we assume that conver- 
gent loss took place in those lineages. During the evolution of 
Blattabacterium, a total of 108 CDS and three tRNA genes 
have been lost in a total of 161 independent events, 77 of 
which were unique losses and 34 convergent losses, 18 genes 
have been lost twice, and another 1 6 have been lost three 
times (table 2). The number of specific losses in each lineage is 
indicated in figure 4, and a detailed description of all the gene 
losses undergone during Blattabacterium strain diversification 
is provided in supplementary results. Supplementary Material 
online. We would like to remark that some of the losses might 
be related to specific events during cockroach evolution, such 
as those that occurred before the split between the two 
Blattidae and the two wood-feeding lineages. Hence, the 
loss of the three first steps of the Krebs cycle may have 
taken place in the common ancestor of BPam and BBor, 
whereas the ancestor of BMda and BCpu lost the ability to 
synthesize several essential and nonessential amino acids 
(fig. 4). Also, it is worth mentioning that two convergent 
changes in sulfur source for metabolism, from sulfate to sul- 
fide, have occurred in the ancestor of BPam and BBor and in 
the BCpu lineage (fig. 5). 

Dating the Split Times 

LRT supported the molecular clock-like evolution for 313 core 
CDS out of a total of 504 CDS. No functional COG category 
was found particularly enriched for genes evolving under the 
molecular clock hypothesis (x^ test, P value = 0.991). In this 
set of 313 core CDS, 290 had an ortholog in F. pshychophi- 
lum, of which for 281, the molecular clock hypothesis could 
not be rejected, and of the 281, 246 show less than 2.5 nu- 
cleotide substitutions per site. Taking into account that the 
split of B. germanica from the rest of roaches was dated in 
140 Ma, the divergence times between BPam and BBor, and 
between BMda and BCpu were estimated as 12.9 ±8.4 and 
88.6 ± 19.8 Ma, respectively. 

Discussion 

The genome sequence of BBor has confirmed the extreme 
stability of the Blattabacterium genome architecture despite 
the fact extant cockroach families appeared more than 
140 Ma (Sabree et al. 2010, 2012; Neef et al. 2011). The 
only rearrangements are a 20 kb inversion, which occurred 
in the Blattidae family lineage, and two inversions in the 
strain BMda, one of 242 kb and another of 2.9 kb, containing 
the genes rffH, dut, and wzxC (fig. 1). As in other endosym- 
biotic bacteria, synteny is highly conserved among the 
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Table 2 

Classification of Gene Losses in Blattabacterium Strains 



Losses 


BBor 


BPam 


BMda 


BCpu 


BBge 


n 


Genes 


Unique loss (77) 










+ 


4 


sirBC, BLBBGE_195, BLBBGE_159, BLBBGE_595 




+ 


+ 


+ 


+ 




1 


blb_0523 








+ 


+ 


+ 


4 


led, acnA, gItA, blb_0575 






+ 


+ 


+ 


+ 


4 


dut, blb_0543, blb_544, tRNA-Val 




+ 




+ 


+ 


+ 


1 


secf 




+ 


+ 






+ 


26 


ilvA ilvB, ilvC ilvD, ilvH, leuA, leuQ leuD, leuB, trpE, trpG, 
trpD, trpC trpB, trpA, thrB, thrC, lysA. argH, metC, ygfA, 
asnC ung, blb_0546, blb_0547, blb_0548 




+ 


+ 


+ 




+ 


15 


metE, metF, cysE, cysK, serQ tgt, luxE, dpB, mdIA msbA, 
blb_0537, blb_0507, blb_0536, blb_0542, tRNA-Arg 




+ 


+ 




+ 


+ 


22 


ubiE, ribD, nadD, mvaKI, mvaD, efp, truB, trmH, nth, 
lolE hemD, ppiC, dpX, dksA secDF, era, blb_0525, 
blb_0518, blb_0522, blb_0525, blb_0529, blb_0541 



Convergent losses (34) 
Twice 







+ 


+ 




2 


ywrO, blb_0586 






+ 




+ 


5 


cysD, cysN, cysl, hemD, blb_0595, 


+ 








+ 


1 


blb_0591 




+ 






+ 


1 


blb_0596 


+ 


+ 


+ 






1 


desA 


+ 


+ 




+ 




1 


blb_0565 




+ 


+ 


+ 




1 


ccoS 




+ 


+ 




+ 


3 


cysG, cysH, hemC 




+ 




+ 


+ 


1 


blb_0578 


+ 






+ 


+ 


1 


lolD 


+ 




+ 




+ 


1 


tRNA-Pro 



3-fold 



- + + - 

+ - + - 

+ - - + 



4 BLBBOR_p001, BLBBOR_p002, BLBBOR_p007, BLBBOR_609 
2 frpF/ BPLAN_099 

1 MADAR_453 

5 BLBCPU_006, BLBCPU_149, BLBCPU_186, BLBCPU_463, BLBCPU_511 
1 IpxP 

1 blb_0585 

2 finpA blb_0587 



Note. — Strain abbreviations as in table 1. +, gene present; gene absent or pseudogene. For genes annotated as hypothetical protein, the number of the orthologous 
cluster in which the gene is classified is indicated by the code blb_ followed by a number (see supplementary table S2, Supplementary Material online). In the case the 
hypothetical protein is present in only one strain, the locus tag of the GenBank file is indicated. 

*This gene is fused to trpB in all other strains. 



different strains; however, surprisingly, Blattabacterium has 
also maintained a very similar gene content, given 83.0% of 
the pan-genome genes are represented in the core (table 3). 
This conservation is even more striking when only the strains 
from omnivorous cockroaches are considered: 93.9% of the 
pan-genome genes are included in the core. Another example 
of endosymbiotic bacteria of omnivorous hosts is Blochmannia 
sp., primary endosymbiont of Camponotus sp. ants. In this 
case also, the number of genes is well conserved because 
93.5% of the pan-genome genes are in the core (Williams 
and Wernegreen 2010). However, the divergence time 
among the three Blochmannia strains has been established 
at approximately 20 Ma (Degnan et al. 2004; Gomez-Valero 
et al. 2008), whereas the symbiosis between cockroaches and 
Blattabacterium originated at least 140 Ma (Clark et al. 2001; 
Lo et al. 2003). Comparison with a similarly ancient symbiotic 



association, similar to the one established between Buchnera 
and aphids between 86 and 164 Ma (von Dohlen and Moran 
2000), and considering only true primary endosymbionts, that 
is, those without any known metabolic complementation with 
other symbionts (as is the case of the cedar and tuja aphids), 
the genetic conservation is much lower, because several gene 
losses have occurred in the different lineages. In particular, 
only 74% of the pan-genome genes are present in the core 
(table 3). All these data suggest that massive gene losses may 
have occurred in Blattabacterium genomes soon after the 
transition from the free-living state to the intracellular life 
style, establishing an optimal genome to fulfill the host re- 
quirements, albeit minimized. We postulate that this gene 
content stasis may be correlated with the role played by 
Blattabacterium in the nitrogen metabolism in cockroaches 
(Gonzalez-Domenech et al. 2012). In fact, gene losses 
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3 rRNA, 33 tRNA, 3 ncRNA) 

582 genes (542 CDS, 
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BBge 3 rRNA, 34 tRNA, 3 ncRNA) 



_l I I I 

150 100 50 0 Mya 

Fig. 4. — Gene loss during Blattabacterium diversification. Number of genes in eacli strain is indicated. Numbers above the branches indicate gene loss 
events. Host family names are indicated on each branch. Abbreviations of Blattabacterium strains as in table 1. 
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Fig. 5. — Sulfate assimilatory pathway genes in the different Blattabac- 
terium sp. strains. +, gene present; -, gene absent; v|;, pseudogene. 
Abbreviations of Blattabacterium strains as in table 1 . 



observed in the Blattidae lineage only affect metabolism in 
peripheral activities (i.e., the change from sulfate to sulfide 
as sulfur source) or irrelevant metabolic steps in the Krebs 
cycle. Hence, one of the most remarkable losses in BBor is 
the absence of genes coding for the three first steps of the 
TCA cycle, citrate synthase (gltA), aconitate hydratase (aciiA), 
and isocitrate dehydrogenase (;cd). Nevertheless, this meta- 
bolic feature shared with BPam (Sabree et al 2009) has no 



effect on metabolic network functionality as shown by FBA 
(Gonzalez-Domenech et al. 2012). 

Most of the core genes in the Blattabacterium pan-genome 
(62.1 %) follow a molecular clock, which allowed us to deter- 
mine the split times between the Blattidae lineages and the 
Termitidae and Cryptocercidae lineages (fig. 4 and supple- 
mentary fig. SI, Supplementary Material online). The diver- 
gence time calculated for the wood-feeding lineages is closer 
to the split between B. germanica and the rest of the 
cockroach species (88.6 vs. 140 Ma). This indicates that the 
metabolic changes taking place in the ancestor of the endo- 
symbionts of M. darwiniensis and C. punctulatus, mainly af- 
fecting the biosynthesis of essential and nonessential amino 
acids (fig. 4), took place very early in the evolutionary history of 
the lineage leading to Termitidiae and Cryptocercidae. 
Conversely, Blatta orientalis and P. americana are much 
more recent lineages (split occurring 12.9 Ma) and, as previ- 
ously stated, the metabolic profile of their common ancestor 
indicates notable metabolic stasis. 

Metabolic comparison among Blattabacterium strains high- 
lights the role of the bacterial metabolic network in host phys- 
iology. For instance, in the case of nitrogen metabolism, 
urease corresponds to the Blattabacterium core genome, sug- 
gesting that the role of the endosymbiont in urate mobiliza- 
tion is ancestral and conserved in all studied Blattabacterium 
lineages. Hence, the combination of urease with the urea 
cycle (interrupted only in BCpu and BMda by the absence of 
argininosuccinate lyase [ASL]) is a metabolic network with the 
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Table 3 

Number of Genes in tlie Pan-Genome and tlie Core in Blattabacterium, Blochmannia sp., and Buchnera 



Genes in Pan-Genome Genes in Core (%)" 



Blattabacterium 

Strains: BBge, BPam, BBor, BCpu, and BMda 652 541 (83.0) 

Onmivorous strains: BBge, BPam, and BBor 644 605 (93.9) 

Blochmannia 

Strains: Bfl, Bpen, and Bva 660 617 (93.5) 

Buchnera 

Strains: BAp, BKo, BUa, BSg, and BBp 650 481 (74.0) 

Note. — Strain abbreviations are as follows. Blochmannia strains: Bfl, Blochmannia floridanus; Bpen, 6. pennsylvanicus; Bva, 6. vafer. Buchnera strains: BAp, Buchnera from 
Acyrtosiphon pisum; BKo, Buchnera from A l<ondoi; BUa, Buchnera from Uroleucon ambrosiae; BSg, Buchnera from Schizaphis graminum; BBp; Buchnera from Baizongia 
pistaciae. 

''Proportion of the pan-genome present in the core genome. 



potential to catabolize nitrogen compounds and generate am- 
monia (Lopez-Sanchez et al. 2009; Gonzalez-Domenech et al. 
2012). Thus, there is a biochemical explanation for the classical 
model of urate deposits acting as nitrogen storage in the cock- 
roach fat body (Mullins and Cochran 1974; Cochran et al. 
1979; Cochran 1985) and the intriguing ammonotelism dis- 
played by these insects (Gonzalez-Domenech et al. 2012). 

With respect to gene losses in the two wood-feeding hosts, 
a higher number of events have occurred although very early 
in their evolutionary history. Remarkably, these affected the 
biosynthesis of essential and nonessential amino acids before 
the split between Cryptocercidae and Termitidae, and the 
convergent loss of sulfate assimilation ability in the BCpu lin- 
eage. As stated earlier, the symbionts from the three omniv- 
orous cockroaches are self-sufficient in terms of supplying 
their host with essential amino acids, whereas up to six path- 
ways for essential amino acids have been lost in the 
wood-feeding hosts. On the basis of the phylogenetic analysis, 
we propose that a major reduction in amino acid biosynthetic 
ability took place in the common ancestor of the two 
wood-feeding lineages. It has been postulated that this met- 
abolic impairment of the endosymbiont could have been com- 
pensated for by amino acid supply from the diet and/or their 
syntheses by gut microbiota (Neef et al. 201 1; Sabree et al. 

2012) , comparable with the metabolic complementation ob- 
served in some bacterial consortia, like Buchnera and Serratia 
in cedar aphids (Lamelas et al. 2011). Additionally, the gene 
argH coding for ASL, the last step in the Arg biosynthesis 
pathway, has been lost in both BCpu and BMda strains, 
thus a host-encoded ASL may also have taken over Arg bio- 
synthesis; nonetheless, the contribution of diet or gut micro- 
biota cannot be ruled out. In fact, 8 of the 10 complete 
genomes from insects accessible in the KEGG database 
(http://vvvvw.genome.jp/kegg/, last accessed February 4, 

2013) contain genes for ASL. The corresponding gene is 
absent from the pea aphid Acyrthosiphon pisum genome 
(but present in its primary endosymbiont Buchnera aphidicola) 
nor is it present in the human body louse Pediculus humanus 
genome (in this case, Arg may be supplied by the blood in- 
gested by the louse). Studies of metabolic modeling supported 



by transcriptomic and proteomic data (Macdonald et al. 2012) 
have revealed the action of host cell enzymes at the end of 
amino acid biosynthesis in the pea aphid and Buchnera aphi- 
dicola, which could have important regulatory consequences. 

There are some remarkable convergent metabolic traits, in 
particular, the capacity for assimilating sulfate. The symbionts 
BBge and BMda possess the genes coding for all enzymes 
involved in sulfate assimilation, with the exception of 5'-phos- 
phosulfate kinase (encoded by cysQ. Experimental data indi- 
cate that this pathway may be operative, because 
aposymbiotic individuals of B. germanica are unable to incor- 
porate sulfate into cysteine and methionine (Block and Henry 
1962). As this pathway is absent in the other Blattabacterium 
strains, it was most likely lost in at least two independent 
events, one in the lineage leading to BCpu and the other 
during the evolution of the endosymbionts of Blattidae species 
(fig. 5). The genes cysN, cysD, and cysl are found as pseudo- 
genes in BPam (Sabree et al. 2009), whereas in BBor, they 
have been completely lost. With respect to the remaining 
genes of this pathway, cysH and cysG are pseudogenized in 
BBor but seem to be functional in BPam. However, gene cys7 is 
present in all sequenced Blattabacterium strains, even in those 
which are unable to assimilate sulfate. The product of this 
gene is a flavoprotein, which forms sulfite reductase with 
the hemoprotein coded by cysl. In this case, cysJ may have 
been recruited by other processes, because it has previously 
been described to work as FMN reductase (Coves et al. 1 993). 

The genomes of BBor, BCpu, and BMda retain seven of the 
nine duplicated genes found in BBge (Lopez-Sanchez et al. 
2009; Neef et al. 2011; Sabree et al. 2012), whereas eight 
of these genes are also maintained in BPam (Sabree et al. 
2009). The presence of these duplicated genes is quite sur- 
prising in the context of a reduced genome, similar to that of 
Blattabacterium and other primary endosymbionts. The fact 
most of these genes are still present in all three strains points 
to their possible functional role in bacterium physiology and 
that these genes might be ecoparalogs (Sanchez-Perez et al. 
2008). 

In summary, the hosts harboring Blattabacterium strains 
have been evolving separately for a very long time; however. 
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the genomic and metabolic architecture of these symbiotic 
bacteria remains strikingly stable. Thus, the basic genetic 
and metabolic traits of Blattabacterium were established in a 
short time period, when these bacteria infected the common 
ancestor of all cockroaches and termites. 

Supplementary Material 

Supplementary results, figure SI, and tables S1-S5 are avail- 
able at Genome Biology and Evolution online (http://www. 
gbe.oxfordjournals.org/). 
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